Farsi and Arabic document images lossy compression based on the mixed raster content model
Internal identifier: 000A83 (Main/Exploration)
Authors: Hadi Grailu [Iran]; Mojtaba Lotfizad [Iran]; Hadi Sadoghi-Yazdi [Iran]
Source:
- International journal on document analysis and recognition : (Print) [1433-2833]; 2009.
French descriptors
- Pascal (Inist)
- Compression donnée, Compression image, Texte, Reconnaissance caractère, Reconnaissance optique caractère, Concordance forme, Traitement document, Arabe, Trame, Composé modèle, Théorie vitesse distorsion, Lisibilité, Modèle mixte, Modélisation, Segmentation, Optimisation, Artefact, Compression signal, Masque.
English descriptors
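- KwdEn: Arabic, Artefact, Character recognition, Data compression, Document processing, Image compression, Legibility, Mask, Mixed model, Model compound, Modeling, Optical character recognition, Optimization, Pattern matching, Raster, Rate distortion theory, Segmentation, Signal compression, Text.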
Abstract
Recently, the mixed raster content (MRC) model was proposed for compound document image compression. Most state-of-the-art document image compression methods, such as DjVu, are based on this model, but they have some disadvantages, especially for Farsi and Arabic document images. First, the Farsi/Arabic script has some characteristics that can be exploited to further improve compression performance. Second, existing segmentation methods have focused on cleanly separating textual objects from the background and/or optimizing the rate-distortion trade-off; however, they have not considered text readability or ease of OCR. Third, these methods usually suffer from undesired jaggy artifacts and misclassify important textual details. In this paper, an MRC-based document image compression method is proposed that achieves a better rate-distortion trade-off than existing state-of-the-art document compression methods. The proposed method performs better in terms of segmentation, bi-level mask layer compression, ease of OCR, and overall compression. It uses a 1D pattern matching technique to compress the mask layer, and a segmentation method that is sufficiently sensitive to small textual objects. Experimental results show that the proposed method achieves considerably higher compression performance than the state-of-the-art DjVu method, by a factor as high as 1.75-2.3.
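The MRC model referred to above (standardized as ITU-T Recommendation T.44) represents a page as a bi-level mask plus a foreground and a background layer; at decoding time each pixel is taken from the foreground where the mask is set and from the background elsewhere. The minimal sketch below illustrates only this layer arithmetic on grayscale images stored as nested lists. The function names `mrc_split` and `mrc_compose` are hypothetical, and the global-threshold split is a deliberately naive stand-in, not the segmentation method proposed in the paper. In a real codec each layer would then be coded separately (for example, the mask with a bi-level coder and the background with a continuous-tone coder); all coding is omitted here.

```python
from typing import List, Tuple

Gray = List[List[int]]
Mask = List[List[bool]]


def mrc_split(gray: Gray, threshold: int = 128, fill: int = 255) -> Tuple[Mask, Gray, Gray]:
    """Hypothetical naive decomposition into MRC layers (NOT the paper's method).

    Pixels darker than `threshold` are treated as text and set in the mask.
    Foreground keeps text pixels (0 elsewhere, a "don't care" value);
    background keeps non-text pixels (`fill` under the text, also "don't care").
    """
    mask = [[p < threshold for p in row] for row in gray]
    fg = [[p if m else 0 for p, m in zip(row, mrow)] for row, mrow in zip(gray, mask)]
    bg = [[fill if m else p for p, m in zip(row, mrow)] for row, mrow in zip(gray, mask)]
    return mask, fg, bg


def mrc_compose(mask: Mask, foreground: Gray, background: Gray) -> Gray:
    """Rebuild the page: foreground where the mask is set, background elsewhere."""
    if not (len(mask) == len(foreground) == len(background)):
        raise ValueError("layers must have the same height")
    out: Gray = []
    for m_row, f_row, b_row in zip(mask, foreground, background):
        if not (len(m_row) == len(f_row) == len(b_row)):
            raise ValueError("layers must have the same width")
        out.append([f if m else b for m, f, b in zip(m_row, f_row, b_row)])
    return out


if __name__ == "__main__":
    page = [[10, 200, 250], [240, 30, 90]]
    mask, fg, bg = mrc_split(page)
    # Without lossy coding of the layers, composition is exact.
    assert mrc_compose(mask, fg, bg) == page
```

The "don't care" pixels (0 in the foreground, `fill` in the background) are where a real encoder gains: they can be set to whatever value makes each layer cheapest to code, since the mask guarantees they are never displayed.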
Affiliations:
- Iran
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000190
- to stream PascalFrancis, to step Curation: 000587
- to stream PascalFrancis, to step Checkpoint: 000199
- to stream Main, to step Merge: 000A93
- to stream Main, to step Curation: 000A83
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Farsi and Arabic document images lossy compression based on the mixed raster content model</title>
<author><name sortKey="Grailu, Hadi" sort="Grailu, Hadi" uniqKey="Grailu H" first="Hadi" last="Grailu">Hadi Grailu</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Electrical Engineering, Tarbiat Modares University</s1>
<s2>Tehran</s2>
<s3>IRN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Iran</country>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Lotfizad, Mojtaba" sort="Lotfizad, Mojtaba" uniqKey="Lotfizad M" first="Mojtaba" last="Lotfizad">Mojtaba Lotfizad</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Electrical Engineering, Tarbiat Modares University</s1>
<s2>Tehran</s2>
<s3>IRN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Iran</country>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Sadoghi Yazdi, Hadi" sort="Sadoghi Yazdi, Hadi" uniqKey="Sadoghi Yazdi H" first="Hadi" last="Sadoghi-Yazdi">Hadi Sadoghi-Yazdi</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Department of Computer Engineering, Ferdowsi University of Mashhad</s1>
<s2>Mashhad</s2>
<s3>IRN</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Iran</country>
<wicri:noRegion>Mashhad</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">10-0182404</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 10-0182404 INIST</idno>
<idno type="RBID">Pascal:10-0182404</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000190</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000587</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000199</idno>
<idno type="wicri:doubleKey">1433-2833:2009:Grailu H:farsi:and:arabic</idno>
<idno type="wicri:Area/Main/Merge">000A93</idno>
<idno type="wicri:Area/Main/Curation">000A83</idno>
<idno type="wicri:Area/Main/Exploration">000A83</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Farsi and Arabic document images lossy compression based on the mixed raster content model</title>
<author><name sortKey="Grailu, Hadi" sort="Grailu, Hadi" uniqKey="Grailu H" first="Hadi" last="Grailu">Hadi Grailu</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Electrical Engineering, Tarbiat Modares University</s1>
<s2>Tehran</s2>
<s3>IRN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Iran</country>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Lotfizad, Mojtaba" sort="Lotfizad, Mojtaba" uniqKey="Lotfizad M" first="Mojtaba" last="Lotfizad">Mojtaba Lotfizad</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Electrical Engineering, Tarbiat Modares University</s1>
<s2>Tehran</s2>
<s3>IRN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Iran</country>
<wicri:noRegion>Tehran</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Sadoghi Yazdi, Hadi" sort="Sadoghi Yazdi, Hadi" uniqKey="Sadoghi Yazdi H" first="Hadi" last="Sadoghi-Yazdi">Hadi Sadoghi-Yazdi</name>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>Department of Computer Engineering, Ferdowsi University of Mashhad</s1>
<s2>Mashhad</s2>
<s3>IRN</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Iran</country>
<wicri:noRegion>Mashhad</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
<imprint><date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Arabic</term>
<term>Artefact</term>
<term>Character recognition</term>
<term>Data compression</term>
<term>Document processing</term>
<term>Image compression</term>
<term>Legibility</term>
<term>Mask</term>
<term>Mixed model</term>
<term>Model compound</term>
<term>Modeling</term>
<term>Optical character recognition</term>
<term>Optimization</term>
<term>Pattern matching</term>
<term>Raster</term>
<term>Rate distortion theory</term>
<term>Segmentation</term>
<term>Signal compression</term>
<term>Text</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Compression donnée</term>
<term>Compression image</term>
<term>Texte</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Concordance forme</term>
<term>Traitement document</term>
<term>Arabe</term>
<term>Trame</term>
<term>Composé modèle</term>
<term>Théorie vitesse distorsion</term>
<term>Lisibilité</term>
<term>Modèle mixte</term>
<term>Modélisation</term>
<term>Segmentation</term>
<term>Optimisation</term>
<term>Artefact</term>
<term>Compression signal</term>
<term>Masque</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Recently, the mixed raster content (MRC) model was proposed for compound document image compression. Most state-of-the-art document image compression methods, such as DjVu, are based on this model, but they have some disadvantages, especially for Farsi and Arabic document images. First, the Farsi/Arabic script has some characteristics that can be exploited to further improve compression performance. Second, existing segmentation methods have focused on cleanly separating textual objects from the background and/or optimizing the rate-distortion trade-off; however, they have not considered text readability or ease of OCR. Third, these methods usually suffer from undesired jaggy artifacts and misclassify important textual details. In this paper, an MRC-based document image compression method is proposed that achieves a better rate-distortion trade-off than existing state-of-the-art document compression methods. The proposed method performs better in terms of segmentation, bi-level mask layer compression, ease of OCR, and overall compression. It uses a 1D pattern matching technique to compress the mask layer, and a segmentation method that is sufficiently sensitive to small textual objects. Experimental results show that the proposed method achieves considerably higher compression performance than the state-of-the-art DjVu method, by a factor as high as 1.75-2.3.</div>
</front>
</TEI>
<affiliations><list><country><li>Iran</li>
</country>
</list>
<tree><country name="Iran"><noRegion><name sortKey="Grailu, Hadi" sort="Grailu, Hadi" uniqKey="Grailu H" first="Hadi" last="Grailu">Hadi Grailu</name>
</noRegion>
<name sortKey="Lotfizad, Mojtaba" sort="Lotfizad, Mojtaba" uniqKey="Lotfizad M" first="Mojtaba" last="Lotfizad">Mojtaba Lotfizad</name>
<name sortKey="Sadoghi Yazdi, Hadi" sort="Sadoghi Yazdi, Hadi" uniqKey="Sadoghi Yazdi H" first="Hadi" last="Sadoghi-Yazdi">Hadi Sadoghi-Yazdi</name>
</country>
</tree>
</affiliations>
</record>
To work with this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A83 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A83 | SxmlIndent | more
To link to this page within the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:10-0182404 |texte= Farsi and Arabic document images lossy compression based on the mixed raster content model }}
This area was generated with Dilib version V0.6.32.